Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: Controller: build assert if comand buffer is too small #19869

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

KyraLengfeld
Copy link
Contributor

Host Number of Completed Packets command does not follow normal flow control of HCI commands and the Controller side HCI drivers that allocates HCI command buffers with K_NO_WAIT can end up running out of command buffers.

Host will generate up to acl_pkts number of Host Number of Completed Packets command plus a number of normal HCI commands.

Normal HCI commands follow the HCI command flow control using Num_HCI_Command_Packets return in HCI command complete and status.

When Controller to Host data flow control is supported, this commit ensures that BT_BUF_CMD_TX_COUNT is greater than or equal to (BT_BUF_ACL_RX_COUNT + Ncmd), where Ncmd is the supported maximum Num_HCI_Command_Packets.

Note: The SDC controller (currently) does not support Num_HCI_Command_Packets > 1, which means Ncmd is always 1.

@github-actions github-actions bot added doc-required PR must not be merged without tech writer approval. changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. labels Jan 13, 2025
@NordicBuilder
Copy link
Contributor

NordicBuilder commented Jan 13, 2025

CI Information

To view the history of this post, clich the 'edited' button above
Build number: 14

Inputs:

Sources:

sdk-nrf: PR head: a4c48d3d2cbda3d914ae0712df39f9a79c066f44

more details

sdk-nrf:

PR head: a4c48d3d2cbda3d914ae0712df39f9a79c066f44
merge base: 6e4a44d1204510e78ac54bac2006af9ddbf36a25
target head (main): 115359d6f2e3970e2bf5a17fefb2608c86a3917e
Diff

Github labels

Enabled Name Description
ci-disabled Disable the ci execution
ci-all-test Run all of ci, no test spec filtering will be done
ci-force-downstream Force execution of downstream even if twister fails
ci-run-twister Force run twister
ci-run-zephyr-twister Force run zephyr twister
List of changed files detected by CI (12)
CODEOWNERS
applications
│  ├── connectivity_bridge
│  │  │ prj.conf
doc
│  ├── nrf
│  │  ├── releases_and_maturity
│  │  │  │ known_issues.rst
scripts
│  ├── ci
│  │  │ tags.yaml
subsys
│  ├── bluetooth
│  │  ├── controller
│  │  │  ├── CMakeLists.txt
│  │  │  ├── hci_internal.c
│  │  │  ├── hci_internal_wrappers.c
│  │  │  │ hci_internal_wrappers.h
tests
│  ├── subsys
│  │  ├── bluetooth
│  │  │  ├── controller
│  │  │  │  ├── CMakeLists.txt
│  │  │  │  ├── prj.conf
│  │  │  │  ├── src
│  │  │  │  │  │ hci_cmd_cb_host_buffer_size_test.c
│  │  │  │  │ testcase.yaml

Outputs:

Toolchain

Version: 342151af73
Build docker image: docker-dtr.nordicsemi.no/sw-production/ncs-build:342151af73_912848a074

Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped; ⚠️ Quarantine

  • ◻️ Toolchain - Skipped: existing toolchain is used
  • ✅ Build twister
    • sdk-nrf test count: 975
  • ✅ Integration tests
    • ✅ desktop52_verification
    • ✅ test_ble_nrf_config
    • ✅ test-fw-nrfconnect-ble_samples
    • ✅ test-fw-nrfconnect-chip
    • ✅ test-fw-nrfconnect-zigbee
    • ✅ test-sdk-find-my
    • ✅ test-sdk-sidewalk
Disabled integration tests
    • doc-internal
    • test-fw-nrfconnect-apps
    • test-fw-nrfconnect-ble_mesh
    • test-fw-nrfconnect-boot
    • test-fw-nrfconnect-fem
    • test-fw-nrfconnect-nfc
    • test-fw-nrfconnect-nrf-iot_lwm2m
    • test-fw-nrfconnect-nrf-iot_mosh
    • test-fw-nrfconnect-nrf-iot_positioning
    • test-fw-nrfconnect-nrf-iot_samples
    • test-fw-nrfconnect-nrf-iot_serial_lte_modem
    • test-fw-nrfconnect-nrf-iot_thingy91
    • test-fw-nrfconnect-nrf-iot_zephyr_lwm2m
    • test-fw-nrfconnect-nrf_crypto
    • test-fw-nrfconnect-ps
    • test-fw-nrfconnect-rpc
    • test-fw-nrfconnect-rs
    • test-fw-nrfconnect-tfm
    • test-fw-nrfconnect-thread
    • test-low-level
    • test-sdk-audio
    • test-sdk-dfu
    • test-sdk-mcuboot
    • test-sdk-pmic-samples
    • test-sdk-wifi
    • test-secdom-samples-public

Note: This message is automatically posted and updated by the CI

@NordicBuilder
Copy link
Contributor

You can find the documentation preview for this PR at this link.

Note: This comment is automatically posted by the Documentation Publish GitHub Action.

@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch from 7e4cf76 to 7dd9904 Compare January 13, 2025 12:37
@KyraLengfeld KyraLengfeld marked this pull request as ready for review January 13, 2025 12:37
@KyraLengfeld KyraLengfeld requested review from a team as code owners January 13, 2025 12:37
@KyraLengfeld KyraLengfeld requested a review from cvinayak January 13, 2025 12:37
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch 2 times, most recently from e9664c2 to 5fabd4a Compare January 13, 2025 16:25
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
doc/nrf/releases_and_maturity/known_issues.rst Outdated Show resolved Hide resolved
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch from 5fabd4a to b1c56cf Compare January 14, 2025 08:53
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch 2 times, most recently from 0459e3f to c1ceef7 Compare January 17, 2025 15:09
* approach is to align the communicated host RX Buffer Count to it here.
*/
#if defined(CONFIG_BT_HCI_ACL_FLOW_CONTROL)
static int sdc_hci_cmd_cb_host_buffer_size_wrapper(void *cmd_params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static int sdc_hci_cmd_cb_host_buffer_size_wrapper(void *cmd_params)
static int sdc_hci_cmd_cb_host_buffer_size_wrapper(const sdc_hci_cmd_cb_host_buffer_size_t *cmd_params)

ctrl_cmd_params.host_total_num_acl_data_packets = MIN(
ctrl_cmd_params.host_total_num_acl_data_packets, (CONFIG_BT_BUF_CMD_TX_COUNT - 1));

return sdc_hci_cmd_cb_host_buffer_size((const void *)&ctrl_cmd_params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return sdc_hci_cmd_cb_host_buffer_size((const void *)&ctrl_cmd_params);
return sdc_hci_cmd_cb_host_buffer_size(&ctrl_cmd_params);

* The SDC controller (currently) does not support Num_HCI_Command_Packets > 1, which means Ncmd
* is always 1.
*/
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < CONFIG_BT_BUF_CMD_TX_COUNT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move it inside sdc_hci_cmd_cb_host_buffer_size_wrapper, so it all in one place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that, when building Controller-only builds BT_BUF_ACL_RX_COUNT is always 0! Hence, need to use a count that is allocated in the Controller?

https://github.com/nrfconnect/sdk-zephyr/blob/9471a136b58bfa6853b4854308ca4429ba670934/include/zephyr/bluetooth/buf.h#L91-L107

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not important how many buffers are allocated inside controller, it still can send out more RX packets than that, if host delays HCI_Host_Number_Of_Completed_Packets. What is important is how many the transport can accept from the controller? Or if transport does not buffer, but just pass-through then checking against 0 here should be fine since it always pass?

* The SDC controller (currently) does not support Num_HCI_Command_Packets > 1, which means Ncmd
* is always 1.
*/
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < CONFIG_BT_BUF_CMD_TX_COUNT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that, when building Controller-only builds BT_BUF_ACL_RX_COUNT is always 0! Hence, need to use a count that is allocated in the Controller?

https://github.com/nrfconnect/sdk-zephyr/blob/9471a136b58bfa6853b4854308ca4429ba670934/include/zephyr/bluetooth/buf.h#L91-L107

@KyraLengfeld KyraLengfeld requested a review from a team as a code owner January 21, 2025 16:28
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch from eba935e to eb14af1 Compare January 21, 2025 17:50
Copy link
Contributor

@nordic-piks nordic-piks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at test tags and scripts/ci/tags.yaml, it look good.

@cvinayak cvinayak dismissed their stale review January 27, 2025 09:17

Dismissed as my requests are covered... will wait for upmerge and review again.

@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch from eb14af1 to 5efa3f8 Compare January 27, 2025 15:08
@KyraLengfeld KyraLengfeld requested a review from a team as a code owner January 27, 2025 15:08
@github-actions github-actions bot removed the changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. label Jan 27, 2025
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch 3 times, most recently from 87c9b92 to 134e520 Compare January 27, 2025 15:33
When Controller to Host data flow control is supported, this commit
ensures that BT_BUF_CMD_TX_COUNT is greater than or equal to
(BT_BUF_ACL_RX_COUNT + Ncmd), where Ncmd is the supported maximum
Num_HCI_Command_Packets, for a controller + host, or host-only build.
Furthermore this commit restricts the `host_total_num_acl_data_packets`
communicated to the controller to how many acknoledgements the
controller is able to receive from the host, in a controller-only build.

Note: Host Number of Completed Packets command does not follow normal
flow control of HCI commands and the Controller side HCI drivers that
allocates HCI command buffers with K_NO_WAIT can end up running out of
command buffers. Host will generate up to acl_pkts number of Host Number
of Completed Packets command plus a number of normal HCI commands.
Normal HCI commands follow the HCI command flow control using
Num_HCI_Command_Packets return in HCI command complete and status.

Note: The SDC controller (currently) does not support
Num_HCI_Command_Packets > 1, which means Ncmd is always 1.

Note: BT_BUF_CMD_TX_COUNT is defined as:
Number of buffers available for outgoing HCI commands from the Host.
BT_BUF_ACL_RX_COUNT is defined as:
Number or incoming ACL data buffers sent from the CTRL to the Host.

Simplified we avoid following situation:
(order not relevant,
 ex. BT_BUF_ACL_RX_COUNT = 2, BT_BUF_CMD_TX_COUNT = 2):

host				|			SDC

|acl packet 1|		----------------->	|BT_BUF_CMD_TX_ 1|

|acl packet 2|		----------------->	|BT_BUF_CMD_TX_ 2|

|Ncmd|			----------------->	No command buffer left.
						IPC expects message was
						received, message lost.

Signed-off-by: Kyra Lengfeld <[email protected]>
This commit extracts the wrapper function in its own file, to avoid
bloating the hci_internal file, as well as tests its funcionality in a
unit test.

Signed-off-by: Kyra Lengfeld <[email protected]>
@KyraLengfeld KyraLengfeld force-pushed the cmd_buffer_workaround branch from 134e520 to a4c48d3 Compare January 28, 2025 11:21
@KyraLengfeld KyraLengfeld requested review from nordic-auko and a team as code owners January 28, 2025 11:21
@@ -75,3 +75,6 @@ CONFIG_HW_ID_LIBRARY=y
CONFIG_RESET_ON_FATAL_ERROR=y
CONFIG_ASSERT=y
CONFIG_POWEROFF=y

# BT
CONFIG_BT_BUF_CMD_TX_COUNT=3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved to the .conf files in boards and be set along with the rest of the BT options

@@ -0,0 +1,8 @@
tests:
bluetooth.controller:
platform_allow: native_sim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platform_allow section should be typed like list.

Suggested change
platform_allow: native_sim
platform_allow:
- native_sim

Comment on lines 898 to 899
BUILD_ASSERT(CONFIG_BT_BUF_CMD_TX_COUNT - 1 > 0,
"We need at least two HCI command buffers to avoid deadlocks.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, this wants BT_HCI_RAW, not !CONFIG_BT_HCI_HOST.

* is always 1.
*/
#if defined(CONFIG_BT_CONN) && defined(CONFIG_BT_HCI_HOST)
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < CONFIG_BT_BUF_CMD_TX_COUNT,
Copy link
Contributor

@alwa-nordic alwa-nordic Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONFIG_BT_BUF_CMD_TX_COUNT when CONFIG_BT_HCI_HOST is very different from when BT_HCI_RAW. (Yes, it's unfortunate and confusing that it's overloaded like this.)

When BT_HCI_RAW, this config controls the capacity of bt_buf_get_tx(BT_BUF_CMD), and it's very clear what the only use of this capacity is. It used only to receive a commands from the Host, and the rate of these commands are controlled by the command flow control. This is usable to compute a statically known worst case buffer use, guaranteeing a K_NO_WAIT allocation. See my comment on documenting assumptions below.

OTOH, when CONFIG_BT_HCI_HOST, this config controls controls the capacity of bt_hci_cmd_create, which is used for anything and everything. That can include many concurrent calls from the application. This is not amendable to statically compute any guarantees in the general case. If we make an additional assumption that whole application and host is only ever allocating one command, and waiting for its completion before allocating the next one, we get an analogous guarantee that bt_hci_cmd_create never blocks. But what problem does that guarantee solve? bt_hci_cmd_create is always K_FOREVER. This also needs to be documented. I suspect this solves a deadlock, right? I'll post more if I deduce it.

* is always 1.
*/
#if defined(CONFIG_BT_CONN) && defined(CONFIG_BT_HCI_HOST)
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < CONFIG_BT_BUF_CMD_TX_COUNT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < CONFIG_BT_BUF_CMD_TX_COUNT,
BUILD_ASSERT(BT_BUF_ACL_RX_COUNT < BT_BUF_CMD_TX_COUNT,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant CONFIG_BT_BUF_CMD_TX_COUNT in the comment above, will make it more clear.

* As we do not propagate the TX command buffer count information to the controller, the simplest
* approach is to align the communicated host RX Buffer Count to it here.
*/
static int sdc_hci_cmd_cb_host_buffer_size_wrapper(void *cmd_params)
Copy link
Contributor

@alwa-nordic alwa-nordic Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does indeed form a guaranteed K_NO_WAIT allocation in HCI samples, with some added assumptions which need to be documented:

  • a) The controller ncmd is always 1 (or 0 or the host pretends it is).
  • The controller frees cmd buffer strictly before sending cmd complete/status.
  • There is no other command than Host Num Complete that ignores flow control.
  • The Host does not send "empty" Host Num Completed messages. (The spec allows zeros in both the number of handles and number of ACKs for that handle.)
  • (The check added by this PR.) The Controller does not send more ACL than it has room for Host Num Complete messages. (Because of (a), all but one pool in the cmd_tx_pool is always free for Host Num Complete).
  • The Controller frees the Host Num Complete buffer strictly before it considers the credits therein as obtained and allowing it to send more data.

Consider if some of these assumptions should be turned into specification by adding them to the Zephyr HCI interface specification include/zephyr/drivers/bluetooth.h.

@shanthanordic
Copy link

Discussed with @guwa , He says If @rugeGerritsen approves, then it is good to go, so I will just approve for scope part not the technical details.

@KyraLengfeld KyraLengfeld marked this pull request as draft January 29, 2025 08:11
@KyraLengfeld
Copy link
Contributor Author

As there are some discussions going on regarding this, converting to draft for now. Once I open it again, I will update according to last change requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required PR must not be merged without tech writer approval.
Projects
None yet
Development

Successfully merging this pull request may close these issues.